initial code
VisCoder2: Building Multi-Language Visualization Coding Agents
Ni, Yuansheng, Cai, Songcheng, Chen, Xiangchao, Liang, Jiarong, Lyu, Zhiheng, Deng, Jiaqi, Zou, Kai, Nie, Ping, Yuan, Fei, Yue, Xiang, Chen, Wenhu
Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale, supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug. Finally, we present VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4.1, with further gains from iterative self-debug, reaching 82.4% overall execution pass rate at the 32B scale, particularly in symbolic or compiler-dependent languages.
TALLMesh: a simple application for performing Thematic Analysis with Large Language Models
De Paoli, Stefano, Fawzi, Alex
Thematic analysis (TA) is a widely used qualitative research method for identifying and interpreting patterns within textual data, such as qualitative interviews. Recent research has shown that it is possible to satisfactorily perform TA using Large Language Models (LLMs). This paper presents a novel application using LLMs to assist researchers in conducting TA. The application enables users to upload textual data, generate initial codes and themes. All of this is possible through a simple Graphical User Interface, (GUI) based on the streamlit framework, working with python scripts for the analysis, and using Application Program Interfaces of LLMs. Having a GUI is particularly important for researchers in fields where coding skills may not be prevalent, such as social sciences or humanities. With the app, users can iteratively refine codes and themes adopting a human-in-the-loop process, without the need to work with programming and scripting. The paper describes the application key features, highlighting its potential for qualitative research while preserving methodological rigor. The paper discusses the design and interface of the app and outlines future directions for this work.
Codebook Reduction and Saturation: Novel observations on Inductive Thematic Saturation for Large Language Models and initial coding in Thematic Analysis
De Paoli, Stefano, Mathis, Walter Stan
This paper reflects on the process of performing Thematic Analysis with Large Language Models (LLMs). Specifically, the paper deals with the problem of analytical saturation of initial codes, as produced by LLMs. Thematic Analysis is a well-established qualitative analysis method composed of interlinked phases. A key phase is the initial coding, where the analysts assign labels to discrete components of a dataset. Saturation is a way to measure the validity of a qualitative analysis and relates to the recurrence and repetition of initial codes. In the paper we reflect on how well LLMs achieve analytical saturation and propose also a novel technique to measure Inductive Thematic Saturation (ITS). This novel technique leverages a programming framework called DSPy. The proposed novel approach allows a precise measurement of ITS.
Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian
This paper proposes a test to perform Thematic Analysis (TA) with Large Language Model (LLM) on data which is in a different language than English. While there has been initial promising work on using pre-trained LLMs for TA on data in English, we lack any tests on whether these models can reasonably perform the same analysis with good quality in other language. In this paper a test will be proposed using an open access dataset of semi-structured interviews in Italian. The test shows that a pre-trained model can perform such a TA on the data, also using prompts in Italian. A comparative test shows the model capacity to produce themes which have a good resemblance with those produced independently by human researchers. The main implication of this study is that pre-trained LLMs may thus be suitable to support analysis in multilingual situations, so long as the language is supported by the model used.
AcademiaOS: Automating Grounded Theory Development in Qualitative Research with Large Language Models
AcademiaOS is a first attempt to automate grounded theory development in qualitative research with large language models. Using recent large language models' language understanding, generation, and reasoning capabilities, AcademiaOS codes curated qualitative raw data such as interview transcripts and develops themes and dimensions to further develop a grounded theoretical model, affording novel insights. A user study (n=19) suggests that the system finds acceptance in the academic community and exhibits the potential to augment humans in qualitative research. AcademiaOS has been made open-source for others to build upon and adapt to their use cases.
Using Large Language Models to Support Thematic Analysis in Empirical Legal Studies
Drápal, Jakub, Westermann, Hannes, Savelka, Jaromir
Thematic analysis and other variants of inductive coding are widely used qualitative analytic methods within empirical legal studies (ELS). We propose a novel framework facilitating effective collaboration of a legal expert with a large language model (LLM) for generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4). We employed the framework for an analysis of a dataset (n = 785) of facts descriptions from criminal court opinions regarding thefts. The goal of the analysis was to discover classes of typical thefts. Our results show that the LLM, namely OpenAI's GPT-4, generated reasonable initial codes, and it was capable of improving the quality of the codes based on expert feedback. They also suggest that the model performed well in zero-shot classification of facts descriptions in terms of the themes. Finally, the themes autonomously discovered by the LLM appear to map fairly well to the themes arrived at by legal experts. These findings can be leveraged by legal researchers to guide their decisions in integrating LLMs into their thematic analyses, as well as other inductive coding projects.